SCALE: A Scalable Language Engineering Toolkit
Abstract
In this paper we present SCALE, a new Python toolkit that contains two extensions to n-gram language models. The first is Semantic Head Mapping (SHM), a novel technique for modeling compound words. The second, Bag-of-Words Language Modeling (BagLM), bundles popular models such as Latent Semantic Analysis and Continuous Skip-grams. Both extensions scale to large data and can be integrated into first-pass ASR decoding. The toolkit is open source, includes working examples, and is available at http://github.com/jorispelemans/scale.
Publication date: 2016